Correlation between planted trees in Berlin and speed limits¶

This project analyzes the correlation between planted trees and speed limits in Berlin. The goal is to find out whether the two are correlated and, if so, how strongly.

Datasources¶

Datasource 1: Baumbestand - Berlin - [WFS]¶

  • Metadata URL: https://mobilithek.info/offers/-5687470862699743129
  • Data URL: https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/s_wfs_baumbestand
  • Data Type: WFS_SRVC

  • Description: Planted street trees in Berlin.

Datasource 2: Tempolimits - Berlin - [WFS]¶

  • Metadata URL: https://mobilithek.info/offers/-8613064499673471355
  • Data URL: https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/s_vms_tempolimits_spatial
  • Data Type: WFS_SRVC
  • Description: Speed Limits in Berlin.

Important notes: Only speed limits that differ from the default speed limit of 50 km/h are listed. Therefore we need another datasource to get the streets with the default speed limit.

Datasource 3: Traffic Network - Berlin - [WFS]¶

  • Metadata URL: https://fbinter.stadt-berlin.de/fb/index.jsp
  • Data URL: https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/s_vms_detailnetz_spatial_gesamt
  • Data Type: WFS_SRVC
  • Description: Traffic network in Berlin.

Important notes: We use this dataset, filtered to streets open to car traffic, and inject the speed limits from datasource 2 into it, defaulting to 50 km/h if no speed limit is found.
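The defaulting step can be sketched as a left join followed by a fill. This is a minimal illustration, not the actual data pipeline: the toy tables, the join key `elem_nr` and the names `toy_streets`, `toy_limits` and `DEFAULT_SPEED_LIMIT` are assumptions made for the example, and the real pipeline may match streets and limits differently (for example spatially).

```python
import pandas as pd

DEFAULT_SPEED_LIMIT = 50  # km/h, the default stated in the notes above

# Hypothetical toy tables; the real WFS layers have a different schema.
toy_streets = pd.DataFrame({
    "elem_nr": ["a", "b", "c"],
    "strassenname": ["Street A", "Street B", "Street C"],
})
toy_limits = pd.DataFrame({
    "elem_nr": ["b"],       # only streets that deviate from the default are listed
    "speed_limit": [30],
})

# A left join keeps every street; streets without a listed limit get NaN ...
streets_with_limits = toy_streets.merge(toy_limits, on="elem_nr", how="left")
# ... which is then replaced by the default speed limit.
streets_with_limits["speed_limit"] = (
    streets_with_limits["speed_limit"].fillna(DEFAULT_SPEED_LIMIT)
)

print(streets_with_limits)
```

A left join is used here so that no street is dropped just because datasource 2 has no entry for it.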

Question¶

Is there a correlation between planted trees and speed limits in Berlin? If so, how strong is it and does the type of tree have an influence on the correlation?

Outline¶

  1. Install required dependencies
  2. Load the preprocessed data
  3. Visualize the data on a map of Berlin
  4. Calculate correlation between planted trees and speed limits
  5. Visualize the correlation between planted trees and speed limits
  6. Conclusion

1. Install required dependencies¶

Initially, install all required dependencies. We use requirements.txt to manage dependencies.

In [1]:
%%capture
%pip install -r requirements.txt

2. Load the preprocessed data¶

Create geopandas dataframes using the preprocessed data from the data pipeline. If some data is missing, the data pipeline will be executed again.

In [2]:
import geopandas as gpd
import os

if (not os.path.exists("data/trees.geojson") 
or not os.path.exists("data/streets.geojson") 
or not os.path.exists("data/speed_limits.geojson") 
or not os.path.exists("data/merged_data.geojson")):
    os.chdir("data")
    os.system("python data/pipeline.py")
    os.chdir("..")

trees: gpd.GeoDataFrame = gpd.read_file("data/trees.geojson")
streets: gpd.GeoDataFrame = gpd.read_file("data/streets.geojson")
speed_limits: gpd.GeoDataFrame = gpd.read_file("data/speed_limits.geojson")
merged_data: gpd.GeoDataFrame = gpd.read_file("data/merged_data.geojson")

3. Visualize the data on a map of Berlin¶

Trees that were mapped to a street are colored green; trees that were not mapped to a street are colored red.

In [3]:
import plotly.io as pio
import plotly.express as px
import shapely
import numpy as np

pio.renderers.default = "notebook"

# Prepare data for plotting
unique_streets = merged_data[~merged_data["elem_nr"].duplicated(keep="last")]
trees_in_merged = trees[trees["id"].isin(merged_data["id_y"])]
trees_not_in_merged = trees[~trees["id"].isin(merged_data["id_y"])]


# Flatten the street geometries into coordinate arrays. A None entry after
# each linestring tells Plotly to break the line there.
lats = []
lons = []
ids = []
names = []
# Named street_speed_limits so that the speed_limits GeoDataFrame loaded above is not overwritten
street_speed_limits = []

for ident, feature, name, speed_limit in zip(unique_streets.id_x, unique_streets.geometry, unique_streets.strassenname, unique_streets.speed_limit):
    if isinstance(feature, shapely.geometry.linestring.LineString):
        linestrings = [feature]
    elif isinstance(feature, shapely.geometry.multilinestring.MultiLineString):
        linestrings = feature.geoms
    else:
        # Skip anything that is not a line geometry
        continue
    for linestring in linestrings:
        x, y = linestring.xy
        lats = np.append(lats, y)
        lons = np.append(lons, x)
        ids = np.append(ids, [ident]*len(y))
        names = np.append(names, [name]*len(y))
        street_speed_limits = np.append(street_speed_limits, [speed_limit]*len(y))
        lats = np.append(lats, None)
        lons = np.append(lons, None)
        ids = np.append(ids, None)
        names = np.append(names, None)
        street_speed_limits = np.append(street_speed_limits, None)

fig = px.line_mapbox(lat=lats, lon=lons, hover_name=names, hover_data=[ids, street_speed_limits], zoom=15, height=500, width=500)


scatter_trace_in_joined = px.scatter_mapbox(trees_in_merged, 
                        lat=trees_in_merged.geometry.y,
                        lon=trees_in_merged.geometry.x,
                        color_discrete_sequence=['green'],
                        hover_name="gattung_deutsch",
                        hover_data=[],)

scatter_trace_not_in_joined = px.scatter_mapbox(trees_not_in_merged, 
                        lat=trees_not_in_merged.geometry.y,
                        lon=trees_not_in_merged.geometry.x,
                        color_discrete_sequence=['red'],
                        hover_name="gattung_deutsch",
                        hover_data=[],)


for trace in scatter_trace_in_joined.data:
    fig.add_trace(trace)

for trace in scatter_trace_not_in_joined.data:
    fig.add_trace(trace)



fig.update_layout(mapbox_style="open-street-map", margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

4. Calculate correlation between planted trees and speed limits¶

In [8]:
import matplotlib.pyplot as plt

def plot_speed_to_tree_count(df):
    plt.scatter(df['speed_limit'], df['tree_count'])
    plt.xlabel('Speed Limit')
    plt.ylabel('Tree Count')
    plt.title('Correlation between Speed Limit and Tree Count')
    plt.show()


# ensure that speed_limit is a float
merged_data['speed_limit'] = merged_data['speed_limit'].astype(float)
# aggregate data by street name
aggregated_data = merged_data.groupby('strassenname').agg({'speed_limit': 'first', 'gattung_deutsch': 'count'}).reset_index()
aggregated_data.rename(columns={'gattung_deutsch': 'tree_count'}, inplace=True)


plot_speed_to_tree_count(aggregated_data)

Comments¶

As the graph above shows, there are some outliers, most of which are Bundesautobahnen and Bundesstraßen. These roads are very long and therefore accumulate a lot of trees. Since we focus on the inner city, we filter them out and then calculate the correlation between planted trees and speed limits.

In [10]:
# Only consider Gemeindestraßen
inner_city_data = merged_data[merged_data['strassenklasse'] == 'G'].copy()

# aggregate data by street name
aggregated_inner_data = inner_city_data.groupby('strassenname').agg({'speed_limit': 'first', 'gattung_deutsch': 'count'}).reset_index()
aggregated_inner_data.rename(columns={'gattung_deutsch': 'tree_count'}, inplace=True)

plot_speed_to_tree_count(aggregated_inner_data)

Comments¶

The number of outliers has decreased. Still, some outliers remain that we can't easily filter out. We could remove all streets longer than a certain threshold, but this would alter our data too much. For this analysis we will stick with this data.
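For reference, the length filter mentioned above (which this analysis deliberately does not apply) could look roughly like the following sketch. The toy table, the `length_m` column and the value of `MAX_LENGTH_M` are assumptions for illustration only; the notebook's data is not known to contain such a column.

```python
import pandas as pd

# Hypothetical per-street table with an assumed length column in metres.
toy_street_stats = pd.DataFrame({
    "strassenname": ["A-Weg", "B-Weg", "C-Allee", "D-Damm"],
    "length_m": [300.0, 450.0, 800.0, 12000.0],
    "tree_count": [12, 20, 35, 900],
})

MAX_LENGTH_M = 5000.0  # assumed threshold, would need to be justified for real data

# Keep only streets at or below the threshold; the very long street is dropped.
short_streets = toy_street_stats[toy_street_stats["length_m"] <= MAX_LENGTH_M]

print(short_streets)
```

Any fixed threshold is somewhat arbitrary, which is one reason to leave the data unfiltered here.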

In [6]:
df = inner_city_data.copy()

df_grouped = df.groupby(['speed_limit']).size().reset_index(name='tree_count')


# Pivot the DataFrame to get tree types as columns
df_pivot = df.pivot_table(index='speed_limit', columns='gattung_deutsch', aggfunc='size', fill_value=0)
 
df_merged = df_grouped.merge(df_pivot, on='speed_limit')

correlation = df_merged.corr()
speed_limit_corr = correlation[['speed_limit']]

5. Visualize the correlation between planted trees and speed limits¶

In [7]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(5, 15))  # Specify the size of your heatmap
sns.heatmap(speed_limit_corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Correlation Heatmap for Speed Limit")
plt.show()

Comments¶

As the heatmap shows, there is a slight correlation between planted trees and speed limits. Contrary to my expectations, however, the correlation is positive: the more trees are planted on a street, the higher its speed limit tends to be.
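To make the sign of the coefficient concrete, here is a minimal sketch with invented toy numbers (not taken from the Berlin datasets). It uses pandas' `Series.corr`, which computes the Pearson coefficient by default; the names `toy`, `r` and `r_flipped` are illustrative.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "speed_limit": [10, 20, 30, 50, 60],
    "tree_count": [5, 9, 14, 22, 27],
})

# Pearson's r lies in [-1, 1]; a positive value means both quantities rise together.
r = toy["speed_limit"].corr(toy["tree_count"])

# Reversing the tree counts pairs high speed limits with low counts,
# which flips the sign of the coefficient.
reversed_counts = toy["tree_count"][::-1].reset_index(drop=True)
r_flipped = toy["speed_limit"].corr(reversed_counts)

print(round(r, 3), round(r_flipped, 3))
```

A coefficient only describes how the two quantities vary together; it does not say that one causes the other.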